When do random forests fail?

Tang, Cheng; Garreau, Damien; Luxburg, Ulrike von

Neural Information Processing Systems

Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, we consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity. We show that subsampling of data points during the tree construction phase is important: forests can become inconsistent either with no subsampling or with overly severe subsampling. As a consequence, even highly randomized trees can lead to inconsistent forests if no subsampling is used, which implies that some of the commonly used setups for random forests can be inconsistent. As a second consequence, we show that trees that have good performance in nearest-neighbor search can be a poor choice for random forests.
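The construction described in the abstract (randomized trees fit on subsamples of the data, with predictions averaged across trees) can be summarized in a few lines. Below is a minimal sketch, not the authors' exact estimator: the subsample fraction `s` is the parameter whose extremes (s close to 0, or s = 1, i.e. no subsampling) correspond to the regimes in which the paper shows forests can become inconsistent. The use of scikit-learn's DecisionTreeRegressor and the `max_features` split randomization are assumptions made purely for illustration.

```python
# Minimal sketch of a subsampled random forest (illustrative, not the paper's
# exact construction): each tree is fit on a random fraction `s` of the data,
# and the forest prediction is the average of the individual tree predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_subsampled_forest(X, y, n_trees=100, s=0.5, seed=None):
    """Fit n_trees randomized trees, each on a subsample of size s * n."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(1, int(s * n))  # subsample size; s is the subsampling rate
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        tree = DecisionTreeRegressor(
            max_features="sqrt",                     # randomized split candidates
            random_state=int(rng.integers(2**31)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    """Average the individual tree predictions."""
    return np.mean([t.predict(X) for t in trees], axis=0)
```

Varying `s` between a very small fraction and 1.0 (no subsampling at all) reproduces, at the level of this toy sketch, the two extremes the paper analyzes.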


Reviews: When do random forests fail?

Neural Information Processing Systems

This is a fairly serious omission, and casual readers would remember the wrong conclusions. This must be fixed for publication, but I think it would be straightforward to fix. Officially, NIPS reviewers are not required to look at the supplementary material, and because I had only three weeks to review six manuscripts, I was not able to make the time to do so during my reviewing. So I worry that publishing this work would mean publishing results without sufficient peer review.

DETAILED COMMENTS
* p. 1: I'm not sure it is accurate to say that growing deep, unsupervised trees with no subsampling is a common setup for learning random forests. It appears in Geurts et al. (2006) as a special case, sometimes in mass estimation [1, 2], and sometimes in Wei Fan's random decision tree papers [3-6]. I don't think these are used very much.
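For context, the "deep, unsupervised trees grown with no subsampling" setup the reviewer refers to (the Geurts et al. (2006) special case) roughly corresponds to the scikit-learn configuration below. This is only an approximate illustration under assumptions, not the reviewer's or the cited papers' exact method: the parameter values are chosen to exhibit the "fully grown, no subsampling, label-independent splits" aspects.

```python
# Rough illustration (an assumption for this review excerpt, not the cited
# papers' implementation) of the "deep trees, no data subsampling" setup.
from sklearn.ensemble import ExtraTreesRegressor

forest = ExtraTreesRegressor(
    n_estimators=500,
    max_depth=None,    # grow each tree until the leaves are pure: "deep" trees
    bootstrap=False,   # every tree sees the full sample: no subsampling
    max_features=1,    # one random feature with a random cut point per split;
                       # with a single candidate the labels do not influence
                       # the tree structure ("totally randomized trees")
)
```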

